11/06/2019 at 22:29 • Filed to: kinja, backup, holyshitholyshitholyshit
[EDIT 11/2020]
OP is from 11/2019. Not sure how this got bumped. Since the apocalypse is once again upon us, I will look into backups this weekend and do a new post about it then.
[Original post from 11/06/19 Continues Below]
Even though we’ve apparently been granted a temporary reprieve, archiving your posts seems like... a good idea. After giving the [link not preserved] a try, which worked well but didn’t retain images or comments, I figured I’d try using “Gotham Grabber” based on this tweet: [embedded tweet not preserved]
Setup was pretty straightforward and the output looks great, except that it also doesn’t include comments, though I’ve asked Parker to look into it when he has time.
Here is how I installed and ran it on Linux:
Download or clone Gotham Grabber from [link not preserved].
Make sure you have Python 3 installed; I had errors when using Python 2.
Update all the things (`sudo apt-get update && sudo apt-get upgrade`, or whatever tickles your fancy).
Check your Node version with `node -v`.
If Node is below version 10, go ahead and update it. I [rest of step not preserved]
I had to run `sudo pip3 install --upgrade setuptools` when I got a weird error, so that might not hurt to do. (Note: that is a double dash before “upgrade”; Kinja keeps changing it.)
With all that sorted, navigate to the Gotham Grabber folder (by default “gotham-grabber-master”). Mine was in my home directory, so the command was:
`cd /home/akio/gotham-grabber-master/`
Run `npm install` (note: this only works if you’re in the Gotham Grabber folder!).
Then run `pip3 install -r requirements.txt`. If you don’t have pip3, you can get it via `sudo apt-get install python3-pip` or your favorite package manager.
Make sure you’re NOT running as root, and execute it via:
`python3 gothamgrabber.py --url https://kinja.com/user`
(Replace “user” with your Kinja username. Note: that is a double dash before “url”; Kinja keeps changing it.)
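The version and root checks from the steps above can be scripted before kicking off a long run. This is a minimal Python sketch, not part of Gotham Grabber: it assumes `node` is on your PATH and prints its version in the usual `v10.16.0` form, and the names `parse_node_major` and `check_prereqs` are hypothetical. The script itself needs Python 3.7 or newer (for `capture_output`), so running it with `python3` at all covers the Python 3 step.

```python
import os
import shutil
import subprocess

MIN_NODE_MAJOR = 10  # the post says to update Node if it is below version 10


def parse_node_major(version_text):
    """Turn `node -v` output such as 'v10.16.0' into its major version (10)."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def check_prereqs():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    node = shutil.which("node")
    if node is None:
        problems.append("node is not installed or not on PATH")
    else:
        out = subprocess.run([node, "-v"], capture_output=True, text=True).stdout
        try:
            major = parse_node_major(out)
        except ValueError:
            problems.append(f"could not read a Node version from {out!r}")
        else:
            if major < MIN_NODE_MAJOR:
                problems.append(f"node {out.strip()} is older than v{MIN_NODE_MAJOR}")
    # The post says NOT to run the grabber as root (os.geteuid exists only on Unix).
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        problems.append("running as root; switch to a normal user")
    return problems


if __name__ == "__main__":
    issues = check_prereqs()
    print("\n".join(issues) if issues else "All checks passed.")
```

Save it under any name (say, `check_prereqs.py`) and run it with `python3 check_prereqs.py` as your normal user; it prints each problem it finds, or "All checks passed."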
It SHOULD then scrape all URLs from your author page and begin converting them to PDF. When finished, it looked like this: [screenshot not preserved]
It took a couple of hours to make all of mine, and the finished PDFs came to about 1.3 GB in total.
The PDFs look like this: [screenshot not preserved]
Again, no comments, but hopefully someone can find a workaround for that. Simply re-enabling the comments field (disabled in tweaks/kinja.css) doesn’t really help, as it only shows the crappy infiniscroll comment preview.
If you’re cool with no comments and have fewer than a couple thousand posts, I could run this for a few people (did I mention it takes a while!?) and send you a link to download the PDFs.
I’ll update this post if someone cracks the code on getting full comments.
11/06/2019 at 22:39
Thanks for doing this. I’ll think carefully before asking you to do anything. Stuff like this is a time suck. But it’s great to see the offer.
11/06/2019 at 22:49
Luckily it does run unassisted once you get everything set up, and I’d imagine I could run a couple in parallel, so it isn’t too bad. But if I had like 50 people asking, it’d be more of an issue of queue management than anything else. If you’d like me to, I’d be happy to.
11/07/2019 at 00:19
Thanks. Okay then, if you’re game, I would like to have an archive just in case.
11/07/2019 at 00:55
Np. 605 posts. I’ll let it run overnight.
11/07/2019 at 01:00
After like an hour of updating shit I’m stuck at step 7. What command in Terminal tells it to open the file via Python?
11/07/2019 at 01:00
You’re the best.
11/07/2019 at 01:11
`python3 gothamgrabber.py -u https://kinja.com/yourusernamehere`
Or, rather, `python3` is the command that opens a file in Python 3.
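Kinja mangled several commands in this thread: the post itself notes that typed double dashes turn into a single long dash, curly quotes replace straight ones, and stray spaces creep in (for example `https ://` instead of `https://`). Below is a minimal Python sketch for cleaning up a command copied from a Kinja page before pasting it into a terminal. The function name `unkinja` is hypothetical, and the substitution list only covers the mangling seen in this thread; it is not a complete map of what Kinja's editor changes.

```python
import re

# Typographic characters Kinja substitutes, mapped back to what a shell expects.
REPLACEMENTS = {
    "\u2014": "--",  # em dash, which Kinja produced from a typed double dash
    "\u2018": "'",   # left single curly quote
    "\u2019": "'",   # right single curly quote
    "\u201c": '"',   # left double curly quote
    "\u201d": '"',   # right double curly quote
}


def unkinja(command):
    """Undo Kinja's typographic substitutions in a command copied from a post."""
    for fancy, plain in REPLACEMENTS.items():
        command = command.replace(fancy, plain)
    # Rejoin a URL scheme split by a stray space, e.g. "https ://" -> "https://".
    command = re.sub(r"\b(https?) ://", r"\1://", command)
    return command.strip()


if __name__ == "__main__":
    # Prints: python3 gothamgrabber.py --url https://kinja.com/user
    print(unkinja("python3 gothamgrabber.py \u2014url https://kinja.com/user"))
```

Replacing text rather than guessing at intent keeps the function predictable: it only touches characters a shell would choke on, and leaves everything else in the command alone.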
11/07/2019 at 01:28
Okay, I got as far as scraping and now every article fails with this message:
Making PDF of
https://oppositelock.kinja.com/who-exactly-buys-a-new-jaguar-anyway-1782579556
(1145/2476)
(node:29326) UnhandledPromiseRejectionWarning: Error: Chromium revision is not downloaded. Run “npm install” or “yarn install”
at Launcher.launch (/home/peppermint/gotham-grabber-master/node_modules/puppeteer/lib/Launcher.js:119:15)
at
(node:29326) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:29326) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Running npm install as prompted gets me this:
peppermint@peppermint ~ $ sudo npm install
npm WARN saveError ENOENT: no such file or directory, open ‘/home/peppermint/package.json’
npm WARN enoent ENOENT: no such file or directory, open ‘/home/peppermint/package.json’
npm WARN peppermint No description
npm WARN peppermint No repository field.
npm WARN peppermint No README data
npm WARN peppermint No license field.
up to date in 0.709s
found 0 vulnerabilities
11/07/2019 at 01:35
Shiiiiiiit! Sorry, that one is on me. I am NOT detail oriented.
Navigate to the gotham-grabber directory and then run `npm install`.
After that, also run `pip3 install -r requirements.txt`
(if you don’t have pip3 you’ll need to `sudo apt-get install python3-pip`).
11/07/2019 at 01:44
Same exact result. EDIT: looks like I had to run npm install in this folder:
/home/peppermint/gotham-grabber-master/node_modules/puppeteer/lib/
Doing it now, will keep you updated.
11/07/2019 at 01:49
Based on the errors you posted, I think you’re in `/home/peppermint/`, not `/home/peppermint/gotham-grabber-master`, when you run `npm install`.
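The mix-up here is an easy one to make: `npm install` works on the `package.json` of the project you run it in, so running it from your home folder (which has none) installs nothing, which is what the `ENOENT ... /home/peppermint/package.json` warnings earlier in the thread are saying. Below is a minimal Python sketch that refuses to run the two install commands unless the target folder looks like the Gotham Grabber checkout. Treating `package.json` plus `requirements.txt` as the marker is an assumption based on the steps in the post, `install_in` and `looks_like_grabber_dir` are hypothetical names, and it calls pip as `python -m pip` through the running interpreter, which on a typical setup does the same job as the post's `pip3`.

```python
import subprocess
import sys
from pathlib import Path

# Files the install steps rely on: npm reads package.json, pip reads requirements.txt.
MARKERS = ("package.json", "requirements.txt")


def looks_like_grabber_dir(folder):
    """True if `folder` contains both files the install steps expect."""
    folder = Path(folder)
    return all((folder / name).is_file() for name in MARKERS)


def install_in(folder):
    """Run the post's two install commands with `folder` as the working directory."""
    folder = Path(folder).expanduser()
    if not looks_like_grabber_dir(folder):
        raise FileNotFoundError(
            f"{folder} is missing one of {MARKERS}; is this the gotham-grabber-master folder?"
        )
    # cwd= runs each command inside the project folder, wherever Python was launched from.
    subprocess.run(["npm", "install"], cwd=folder, check=True)
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
        cwd=folder,
        check=True,
    )
```

For example, `install_in("~/gotham-grabber-master")`. Passing `cwd=` instead of relying on `cd` is the point of the design: the commands always run in the project folder, and `check=True` stops at the first failing step instead of carrying on.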
11/07/2019 at 01:53
Ok cool. Imma go to bed. If you don’t get it working I can run it for you tomorrow if you’d like.
11/07/2019 at 01:55
No, I’m in the right folder. Unfortunately running npm install gets me this:
peppermint@peppermint ~/gotham-grabber-master $ npm install
npm WARN pdfgrabber@1.0.0 No repository field.
removed 164 packages and audited 51 packages in 1.675s
found 0 vulnerabilities
I’ve now gotten a different error message when scraping:
Making PDF of
https://oppositelock.kinja.com/im-playing-a-game-about-bendy-buses-1834797117
(163/2476)
(node:6037) UnhandledPromiseRejectionWarning: Error: Failed to launch chrome!
/home/peppermint/gotham-grabber-master/node_modules/puppeteer/.local-chromium/linux-686378/chrome-linux/chrome: 1: /home/peppermint/gotham-grabber-master/node_modules/puppeteer/.local-chromium/linux-686378/chrome-linux/chrome: Syntax error: Unterminated quoted string
TROUBLESHOOTING:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
at onClose (/home/peppermint/gotham-grabber-master/node_modules/puppeteer/lib/Launcher.js:348:14)
at Interface.helper.addEventListener (/home/peppermint/gotham-grabber-master/node_modules/puppeteer/lib/Launcher.js:337:50)
at emitNone (events.js:111:20)
at Interface.emit (events.js:208:7)
at Interface.close (readline.js:368:8)
at Socket.onend (readline.js:147:10)
at emitNone (events.js:111:20)
at Socket.emit (events.js:208:7)
at endReadableNT (_stream_readable.js:1064:12)
at _combinedTickCallback (internal/process/next_tick.js:139:11)
(node:6037) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:6037) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
11/07/2019 at 02:01
Huh, that is... certainly a thing. Did you also install the requirements? I assume it would have complained before then if you hadn’t... Otherwise I don’t really know what to tell you.
I’ll get yours going on my computer overnight if that works.
11/07/2019 at 02:09
Could this have something to do with me running Linux live off a thumb drive?
11/07/2019 at 05:31
That’s an awesome method too!
I might give it a try as well (although I’m pretty happy with the text+images and being able to selectively choose articles in my version, built upon Just Jeepin’s code).
11/07/2019 at 07:52
Execute Gotham Grabber by prepending `python3 -i` to the command line string. Depending on how your Python dev environment is set up, Puppeteer can have issues attempting to execute the headless instance of Chrome. I think it has something to do with how stdin is passed from the Python script up through Node.js to kick off the headless Chrome instance for each PDF conversion.
11/07/2019 at 09:46
I roped in a more technically adept friend and apparently I have 4,241 articles across Jalopnik, OppositeLock, The Inventory and who knows what else. Fhwelp!
I think I still have to separately PDF the couple of Porsche pieces I did for Studio @ Gizmodo, since those were input separately from my main account, but it’s all taking a while. Sitting at 1,586 of 4,241 PDFs made.
11/07/2019 at 11:11
I gave up. Every time I google an error message and update the related thing to fix it, I get stopped by a new message at some other step of the way.
11/07/2019 at 11:46
This is all Greek to me, but I already used another tool (the “Save Page WE” browser extension) to save my posts. It didn’t save comments either, though. If somebody does come up with a way to save comments, by golly I will sit down and learn this gobbledygook.
11/07/2019 at 11:53
Here ya go. Link expires in 2 weeks.
https://www.dropbox.com/s/duqciyw3rt5rje6/chariotoflove.zip?dl=0
11/07/2019 at 12:15
Maybe? Here is a link to your stuff though. Link will expire in 2 weeks.
https://www.dropbox.com/s/l2k04wmqcaqzj2q/essextee.zip?dl=0
11/07/2019 at 13:45
Got it! Fantastic!
11/07/2019 at 14:03
Thanks a lot, I’ll download it when I get home tomorrow.
11/07/2019 at 21:11
The only machine I currently have running Linux is a Dell laptop from about 2006 (it runs my 3D printer that I never use). I do not know if it could handle this task, but I may have to try.
11/08/2019 at 11:15
Here ya go! Unless you really wanted to try, haha. Link will expire in two weeks.
https://www.dropbox.com/s/fcvhrmvyyakg1bl/mattlachesky.zip?dl=0
11/08/2019 at 19:53
Thanks! I’d have given it a shot, but I’m doubtful it would have worked. It also would have been a while before I had time to even check, so this is much appreciated.
11/11/2019 at 15:49
I’ve literally just been going through every single post I’ve ever written in Chrome, hitting Ctrl+P and saving each one as a PDF, after first scrolling down to where the comments end.
11/21/2019 at 22:49
Can I bother someone to save my posts? They’re not that important, but still. Oppo and just regular ones. I don’t Linux...
11/21/2019 at 23:49
Yeah let me see if I can get that going again haha.
11/22/2019 at 00:29
Done and done. Link will expire in two weeks.
https://www.dropbox.com/s/wo74jg3y3d4ej6r/dogapult.zip?dl=0
11/06/2020 at 12:07
For anyone on Mac, you can do:
`brew install python` (or `brew upgrade python` if your Python version is wrong). Every other command will work on your local Mac.
11/06/2020 at 12:12
I know everyone is probably asking but... help for the poor. Coding and time are not my specialty today.
11/06/2020 at 13:07
Please, sir, back up my posts?
11/06/2020 at 13:18
Oh god, is the Kinja apocalypse upon us again?
I can check to see if this still works (unlikely, but possible) next week. HMU if you don’t hear back by Wednesday.
11/06/2020 at 13:58
I’d take a backup if you’ve got the time. I only started posting in May this year, so it should be pretty small. I truly appreciate it!
11/06/2020 at 16:32
Thanks for looking into this again. I’ll be checking in on the weekend to download what few things I have authored.
11/11/2020 at 15:08
Thanks for posting this. It worked like a charm. I only wish it could have captured the comments as well. Still better than nothing, though. Thanks again!
11/11/2020 at 15:52
Glad it worked! Yeah we looked into doing the comments but to no avail.